STIC Biotechnology Systems Branch 



RAW SEQUENCE LISTING 
ERROR REPORT 



The Biotechnology Systems Branch of the Scientific and Technical Information 
Center (STIC) detected errors when processing the following computer readable 
form: 



Application Serial Number: 
Source: 

Date Processed by STIC: 




THE ATTACHED PRINTOUT EXPLAINS DETECTED ERRORS. 

PLEASE FORWARD THIS INFORMATION TO THE APPLICANT BY EITHER: 

1) INCLUDING A COPY OF THIS PRINTOUT IN YOUR NEXT COMMUNICATION TO THE 
APPLICANT, WITH A NOTICE TO COMPLY or, 

2) TELEPHONING APPLICANT AND FAXING A COPY OF THIS PRINTOUT, WITH A 
NOTICE TO COMPLY 

FOR CRF SUBMISSION AND PATENTIN SOFTWARE QUESTIONS, PLEASE CONTACT 
MARK SPENCER, TELEPHONE: 571-272-2510; FAX: 571-273-0221 



TO REDUCE ERRORED SEQUENCE LISTINGS, PLEASE USE THE CHECKER 
VERSION 4.2.2 PROGRAM, ACCESSIBLE THROUGH THE U.S. PATENT AND 
TRADEMARK OFFICE WEBSITE. SEE BELOW FOR ADDRESS: 
http://www.uspto.gov/web/offices/pac/checker/chkrnote.htm 



Applicants submitting genetic sequence information electronically on diskette or CD-Rom should be aware that there is 
a possibility that the disk/CD-Rom may have been affected by treatment given to all incoming mail. 

Please consider using alternate methods of submission for the disk/CD-Rom or replacement disk/CD-Rom. 

Any reply including a sequence listing in electronic form should NOT be sent to the 20231 zip code address for the 
United States Patent and Trademark Office, and instead should be sent via the following to the indicated addresses: 

1. EFS-Bio (<http://www.uspto.gov/ebc/efs/downloads/documents.htm>, EFS Submission User Manual - ePAVE) 

2. U.S. Postal Service: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450 

3. Hand Carry, Federal Express, United Parcel Service, or other delivery service (EFFECTIVE 01/14/05): 
U.S. Patent and Trademark Office, Mail Stop Sequence, Customer Window, Randolph Building, 401 Dulany Street, 
Alexandria, VA 22314 

Revised 01/24/05 



Raw Sequence Listing Error Summary 

ERROR DETECTED          SUGGESTED CORRECTION          SERIAL NUMBER: 

ATTN: NEW RULES CASES: PLEASE DISREGARD ENGLISH "ALPHA" HEADERS, WHICH WERE INSERTED BY PTO SOFTWARE 

1 Wrapped Nucleics / Wrapped Aminos 
The number/text at the end of each line "wrapped" down to the next line. This may occur if your file 
was retrieved in a word processor after creating it. Please adjust your right margin to .3; this will 
prevent "wrapping." 

2 Invalid Line Length 
The rules require that a line not exceed 72 characters in length. This includes white spaces. 

3 Misaligned Amino Numbering 
The numbering under each 5th amino acid is misaligned. Do not use tab codes between numbers; 
use space characters instead. 

4 Non-ASCII 
The submitted file was not saved in ASCII (DOS) text, as required by the Sequence Rules. Please 
ensure your subsequent submission is saved in ASCII text. 

5 Variable Length 
Sequence(s) contain n's or Xaa's representing more than one residue. Per Sequence Rules, 
each n or Xaa can only represent a single residue. Please present the maximum number of each 
residue having variable length and indicate in the <220>-<223> section that some may be missing. 

6 PatentIn 2.0 "bug" 
A "bug" in PatentIn version 2.0 has caused the <220>-<223> section to be missing from amino acid 
sequence(s). Normally, PatentIn would automatically generate this section from the 
previously coded nucleic acid sequence. Please manually copy the relevant <220>-<223> section to 
the subsequent amino acid sequence. This applies to the mandatory <220>-<223> sections for 
Artificial or Unknown sequences. 

7 Skipped Sequences (OLD RULES) 
Sequence(s) missing. If intentional, please insert the following lines for each skipped sequence: 

(2) INFORMATION FOR SEQ ID NO:X: (insert SEQ ID NO where "X" is shown) 
(i) SEQUENCE CHARACTERISTICS: (Do not insert any subheadings under this heading) 
(xi) SEQUENCE DESCRIPTION SEQ ID NO:X: (insert SEQ ID NO where "X" is shown) 
This sequence is intentionally skipped 

Please also adjust the "(ii) NUMBER OF SEQUENCES." response to include the skipped sequences. 

8 Skipped Sequences (NEW RULES) 
Sequence(s) missing. If intentional, please insert the following lines for each skipped sequence: 

<210> sequence id number 
<400> sequence id number 
000 

9 Use of n's or Xaa's (NEW RULES) 
Use of n's and/or Xaa's has been detected in the Sequence Listing. 
Per 1.823 of Sequence Rules, use of <220>-<223> is MANDATORY if n's or Xaa's are present. 
In <220> to <223> section, please explain location of n or Xaa, and which residue n or Xaa represents. 

10 Invalid <213> Response 
Per 1.823 of Sequence Rules, the only valid <213> responses are: Unknown, Artificial Sequence, or 
scientific name (Genus/species). <220>-<223> section is required when <213> response is Unknown or 
is Artificial Sequence. 

11 Use of <220> 
Sequence(s) missing the <220> "Feature" and associated numeric identifiers and responses. 
Use of <220> to <223> is MANDATORY if <213> "Organism" response is "Artificial Sequence" or 
"Unknown." Please explain source of genetic material in <220> to <223> section. 
(See "Federal Register," 06/01/1998, Vol. 63, No. 104, pp. 29631-32) (Sec. 1.823 of Sequence Rules) 

12 PatentIn 2.0 "bug" 
Please do not use "Copy to Disk" function of PatentIn version 2.0. This causes a corrupted file, 
resulting in missing mandatory numeric identifiers and responses (as indicated on raw sequence 
listing). Instead, please use "File Manager" or any other manual means to copy file to floppy disk. 

13 Misuse of n/Xaa 
"n" can only represent a single nucleotide; "Xaa" can only represent a single amino acid. 

AMC - Biotechnology Systems Branch - 09/09/2003 
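
Two of the corrections in the error summary lend themselves to a quick local check before resubmission: replacing tab codes with spaces and saving the file as ASCII text. The Python sketch below is only an illustration and is not part of the PTO Checker program; the function names `find_encoding_problems` and `skipped_sequence_placeholder` are hypothetical. The second helper builds the three placeholder lines that the summary gives for an intentionally skipped sequence under the new rules.

```python
def find_encoding_problems(text):
    """Return (line_number, message) pairs for tabs and non-ASCII characters.

    Hypothetical helper: it only mirrors the "Misaligned Amino Numbering"
    and "Non-ASCII" items of the error summary.
    """
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            problems.append((number, "tab character; use spaces"))
        if not line.isascii():
            problems.append((number, "non-ASCII character"))
    return problems


def skipped_sequence_placeholder(seq_id):
    """Build the new-rules lines for an intentionally skipped sequence."""
    return "<210> {0}\n<400> {0}\n000\n".format(seq_id)


# Small made-up sample: line 3 contains a tab, line 4 a non-breaking space.
sample = "<210> 1\n<400> 1\nMet\tAla\n1\u00a05\n"
for number, message in find_encoding_problems(sample):
    print(number, message)
print(skipped_sequence_placeholder(2), end="")
```

A clean file yields an empty list, so the check could be run on every draft of the listing before it is copied to disk.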







RAW SEQUENCE LISTING DATE: 05/09/2005 

PATENT APPLICATION: US/10/507,257 TIME: 07:58:22 

Input Set : A:\3170.1006-000 Sequence List.TXT 
Output Set: N:\CRF4\05092005\J507257.raw 

3 <110> APPLICANT: WILKINS, Marc 

4 ARTHUR, Jonathon W 

6 <120> TITLE OF INVENTION: Annotation of genome sequences 

8 <130> FILE REFERENCE: 3170.1006.000 

10 <140> CURRENT APPLICATION NUMBER: US 10/507,257 

C--> 11 <141> CURRENT FILING DATE: 2004-09-10 

13 <150> PRIOR APPLICATION NUMBER: AU PS1118 

14 <151> PRIOR FILING DATE: 2002-03-13 

16 <150> PRIOR APPLICATION NUMBER: PCT/AU03/00300 

17 <151> PRIOR FILING DATE: 2003-03-13 

19 <160> NUMBER OF SEQ ID NOS: 108 

21 <170> SOFTWARE: PatentIn version [illegible] 



23 <210> SEQ ID NO: 1 

24 <211> LENGTH: 349 

25 <212> TYPE: PRT 

26 <213> ORGANISM: M. tuberculosis h37RV segment bp 420001 to 421050 
28 <400> SEQUENCE: 1 



30 Gln Ala Val Thr Asn Val Asp Arg Thr Val Arg Ser Val Lys Arg His 
31 1 5 10 15 

34 Met Gly Ser Asp Trp Ser Ile Glu Ile Asp Gly Lys Lys Tyr Thr Ala 
35 20 25 30 

38 Pro Glu Ile Ser Ala Arg Ile Leu Met Lys Leu Lys Arg Asp Ala Glu 
39 35 40 45 

42 Ala Tyr Leu Gly Glu Asp Ile Thr Asp Ala Val Ile Thr Thr Pro Ala 
43 50 55 60 

46 Tyr Phe Asn Asp Ala Gln Arg Gln Ala Thr Lys Asp Ala Gly Gln Ile 
47 65 70 75 80 

50 Ala Gly Leu Asn Val Leu Arg Ile Val Asn Glu Pro Thr Ala Ala Ala 
51 85 90 95 

54 Leu Ala Tyr Gly Leu Asp Lys Gly Glu Lys Glu Gln Arg Ile Leu Val 
55 100 105 110 

58 Phe Asp Leu Gly Gly Gly Thr Phe Asp Val Ser Leu Leu Glu Ile Gly 
59 115 120 125 

62 Glu Gly Val Val Glu Val Arg Ala Thr Ser Gly Asp Asn His Leu Gly 
63 130 135 140 

66 Gly Asp Asp Trp Asp Gln Arg Val Val Asp Trp Leu Val Asp Lys Phe 
67 145 150 155 160 

70 Lys Gly Thr Ser Gly Ile Asp Leu Thr Lys Asp Lys Met Ala Met Gln 
71 165 170 175 

74 Arg Leu Arg Glu Ala Ala Glu Lys Ala Lys Ile Glu Leu Ser Ser Ser 
75 180 185 190 

78 Gln Ser Thr Ser Ile Asn Leu Pro Tyr Ile Thr Val Asp Ala Asp Lys 
79 195 200 205 




82 Asn Pro Leu Phe Leu Asp Glu Gln Leu Thr Arg Ala Glu Phe Gln Arg 
83 210 215 220 

86 Ile Thr Gln Asp Leu Leu Asp Arg Thr Arg Lys Pro Phe Gln Ser Val 
87 225 230 235 240 

90 Ile Ala Asp Thr Gly Ile Ser Val Ser Glu Ile Asp His Val Val Leu 
91 245 250 255 

94 Val Gly Gly Ser Thr Arg Met Pro Ala Val Thr Asp Leu Val Lys Glu 
95 260 265 270 

98 Leu Thr Gly Gly Lys Glu Pro Asn Lys Gly Val Asn Pro Asp Glu Val 
99 275 280 285 

102 Val Ala Val Gly Ala Ala Leu Gln Ala Gly Val Leu Lys Gly Glu Val 
103 290 295 300 

106 Lys Asp Val Leu Leu Leu Asp Val Thr Pro Leu Ser Leu Gly Ile Glu 
107 305 310 315 320 

110 Thr Lys Gly Gly Val Met Thr Arg Leu Ile Glu Arg Asn Thr Thr Ile 
111 325 330 335 

114 Pro Thr Lys Arg Ser Glu Thr Phe Thr Thr Ala Asp Asp 
115 340 345 

118 <210> SEQ ID NO: 2 
119 <211> LENGTH: 8 
120 <212> TYPE: PRT 
121 <213> ORGANISM: peptide 
123 <400> SEQUENCE: 2 

125 Ile Thr Gln Asp Leu Leu Asp Arg 
126 1 5 

129 <210> SEQ ID NO: 3 
130 <211> LENGTH: 8 
131 <212> TYPE: PRT 
132 <213> ORGANISM: [illegible] 
134 <400> SEQUENCE: 3 

136 Val Val Asp Trp Leu Val Asp Lys 
137 1 5 

140 <210> SEQ ID NO: 4 
141 <211> LENGTH: 9 
142 <212> TYPE: PRT 
143 <213> ORGANISM: peptide ID 19685 
145 <400> SEQUENCE: 4 

147 Met Pro Ala Val Thr Asp Leu Val Lys 
148 1 5 

151 <210> SEQ ID NO: 5 
152 <211> LENGTH: 9 
153 <212> TYPE: PRT 
154 <213> ORGANISM: peptide ID 19659 
156 <400> SEQUENCE: 5 

158 Tyr Thr Ala Pro Glu Ile Ser Ala Arg 
159 1 5 

162 <210> SEQ ID NO: 6 
163 <211> LENGTH: 12 
164 <212> TYPE: PRT 



165 <213> ORGANISM: [illegible] 
167 <400> SEQUENCE: 6 

169 Asp Ala Gly Gln Ile Ala Gly Leu Asn Val Leu Arg 
170 1 5 10 

173 <210> SEQ ID NO: 7 
174 <211> LENGTH: 11 
175 <212> TYPE: PRT 
176 <213> ORGANISM: peptide ID 19680 
178 <400> SEQUENCE: 7 

180 Asn Pro Leu Phe Leu Asp Glu Gln Leu Thr Arg 
181 1 5 10 

184 <210> SEQ ID NO: 8 
185 <211> LENGTH: 13 
186 <212> TYPE: PRT 
187 <213> ORGANISM: [illegible] 
189 <400> SEQUENCE: 8 

191 His Met Gly Ser Asp Trp Ser Ile Glu Ile Asp Gly Lys 
192 1 5 10 

195 <210> SEQ ID NO: 9 
196 <211> LENGTH: 13 
197 <212> TYPE: PRT 
198 <213> ORGANISM: peptide ID 1489.65071mod 
200 <400> SEQUENCE: 9 

202 His Met Gly Ser Asp Trp Ser Ile Glu Ile Asp Gly Lys 
203 1 5 10 

206 <210> SEQ ID NO: 10 
207 <211> LENGTH: 16 
208 <212> TYPE: PRT 
209 <213> ORGANISM: peptide 
211 <400> SEQUENCE: 10 

213 Ile Val Asn Glu Pro Thr Ala Ala Ala Leu Ala Tyr Gly Leu Asp Lys 
214 1 5 10 15 

217 <210> SEQ ID NO: 11 
218 <211> LENGTH: 16 
219 <212> TYPE: PRT 
220 <213> ORGANISM: peptide ID 19670 
222 <400> SEQUENCE: 11 

224 Ala Thr Ser Gly Asp Asn His Leu Gly Gly Asp Asp Trp Asp Gln Arg 
225 1 5 10 15 

228 <210> SEQ ID NO: 12 
229 <211> LENGTH: 17 
230 <212> TYPE: PRT 
231 <213> ORGANISM: peptide ID [illegible] 
233 <400> SEQUENCE: 12 

235 Asp Val Leu Leu Leu Asp Val Thr Pro Leu Ser Leu Gly Ile Glu Thr 
236 1 5 10 15 
239 Lys 

243 <210> SEQ ID NO: 13 
244 <211> LENGTH: 20 



245 
246 
248 
250 
251 
254 
255 




20 



258 <210> SEQ ID NO: 14 

259 <211> LENGTH: 51 

260 <212> TYPE: PRT 

261 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
263 <400> SEQUENCE: 14 

265 Ala Arg Ala Gly His Thr Ala Leu Arg Arg Gly Gly Pro Asp Val Ala 

266 1 5 10 15 

269 Asp Pro Gln Ser Pro Ala Met His Arg Gln Arg Gly Asp Asp Arg Gly 

270 20 25 30 

273 Val Arg Arg Ala Ala Gly Gly Arg Gly Ser Ala Ala Val Ala Ala Gly 

274 35 40 45 

277 Arg Ala Gln 

278 50 

281 <210> SEQ ID NO: 15 

282 <211> LENGTH: 35 

283 <212> TYPE: PRT 

284 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
286 <400> SEQUENCE: 15 

288 Pro Gly Ser Ala Arg Asp Ala Gly Ala Gly Ala Val Thr Arg Thr Leu 

289 1 5 10 15 

292 His Leu Ala Tyr Pro Asp Thr Leu Ala Thr Arg Lys Gln Gly Gly Ala 

293 20 25 30 

296 Leu Glu Cys 

297 35 

300 <210> SEQ ID NO: 16 

301 <211> LENGTH: 4 

302 <212> TYPE: PRT 

303 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
305 <400> SEQUENCE: 16 

307 His Ser His Val 

308 1 

311 <210> SEQ ID NO: 17 

312 <211> LENGTH: 51 

313 <212> TYPE: PRT 

314 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
316 <400> SEQUENCE: 17 

318 Ser Ala Arg Trp Gln Ser Ala Asn Pro Cys Val Gly Thr Arg Asp Asp 

319 1 5 10 15 

322 Gly Ala Arg Ala Arg Thr Asn Ala Glu Ser Pro Gly Asn Ser Asp Gly 

323 20 25 30 

326 Ser Gly Ala Arg Pro Gly Pro Thr Ala Asn Ser Gly Pro Gly Glu Arg 

327 35 40 45 

330 Pro Gly Leu 

331 50 

334 <210> SEQ ID NO: 18 

335 <211> LENGTH: 2 

336 <212> TYPE: PRT 

337 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
339 <400> SEQUENCE: 18 

341 Ser Lys 
342 1 

345 <210> SEQ ID NO: 19 

346 <211> LENGTH: 105 

347 <212> TYPE: PRT 

348 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 



350 <400> SEQUENCE: 19 

352 Trp Arg Ala Pro Ile Val Ala Lys Val Asn Ile Lys Pro Leu Glu Asp 
353 1 5 10 15 

356 Lys Ile Leu Val Gln Ala Asn Glu Ala Glu Thr Thr Thr Ala Ser Gly 
357 20 25 30 

360 Leu Val Ile Pro Asp Thr Ala Lys Glu Lys Pro Gln Glu Gly Thr Val 
361 35 40 45 

364 Val Ala Val Gly Pro Gly Arg Trp Asp Glu Asp Gly Glu Lys Arg Ile 
365 50 55 60 

368 Pro Leu Asp Val Ala Glu Gly Asp Thr Val Ile Tyr Ser Lys Tyr Gly 
369 65 70 75 80 

372 Gly Thr Glu Ile Lys Tyr Asn Gly Glu Glu Tyr Leu Ile Leu Ser Ala 
373 85 90 95 

376 Arg Asp Val Leu Ala Val Val Ser Lys 
377 100 105 

380 <210> SEQ ID NO: 20 

381 <211> LENGTH: 13 
382 <212> TYPE: PRT 

383 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
385 <400> SEQUENCE: 20 

387 Ser Val Phe Arg Pro Gly Asp Pro Arg Ala His His Gly 

388 1 5 10 
391 <210> SEQ ID NO: 21 

392 <211> LENGTH: 9 

393 <212> TYPE: PRT 

394 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
396 <400> SEQUENCE: 21 

398 Phe Pro Gly Arg His Ala Leu Ala Asp 
399 1 5 

402 <210> SEQ ID NO: 22 

403 <211> LENGTH: 63 

404 <212> TYPE: PRT 

405 <213> ORGANISM: Mycobacterium tuberculosis H37RV segment bp 3836701 to 3837750 
407 <400> SEQUENCE: 22 

409 Pro Cys Val Glu Glu Pro Asp Glu Gln Ala Asp Arg Ile Arg Arg Asn 

410 1 5 10 15 
RAW SEQUENCE LISTING ERROR SUMMARY DATE: 05/09/2005 

PATENT APPLICATION: US/10/507,257 TIME: 07:58:23 

Input Set : A:\3170.1006-000 Sequence List.TXT 
Output Set: N:\CRF4\05092005\J507257.raw 

Invalid Line Length; 

The rules require that a line not exceed 72 characters in length. This includes spaces. 

Seq#:85; Line(s) 1338 
Seq#:86; Line(s) 1357 
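
The per-sequence report above can be approximated locally before resubmitting. The Python sketch below is an illustration only, not the PTO software: the function name `overlong_lines_by_sequence` is hypothetical, and it assumes each sequence begins with a `<210>` line whose first number is the SEQ ID. It groups every line longer than 72 characters, spaces included, under the sequence in which it occurs.

```python
import re

MAX_LINE_LENGTH = 72  # limit quoted in the error summary; spaces count


def overlong_lines_by_sequence(text, limit=MAX_LINE_LENGTH):
    """Map each SEQ ID to the 1-based numbers of its lines longer than `limit`.

    Assumption: a sequence starts at a "<210>" line, and the first number on
    that line is its SEQ ID. Lines before the first "<210>" are filed under None.
    """
    report = {}
    current_seq = None
    for number, line in enumerate(text.split("\n"), start=1):
        match = re.match(r"\s*<210>\D*(\d+)", line)
        if match:
            current_seq = int(match.group(1))
        # Ignore a trailing carriage return from DOS line endings.
        if len(line.rstrip("\r")) > limit:
            report.setdefault(current_seq, []).append(number)
    return report


# Made-up listing: one 73-character line in SEQ 85, one 80-character line in SEQ 86.
listing = "<210> 85\n" + "A" * 73 + "\n<210> 86\n" + "B" * 72 + "\n" + "C" * 80 + "\n"
for seq_id, lines in overlong_lines_by_sequence(listing).items():
    print("Seq#:{}; Line(s) {}".format(seq_id, ",".join(map(str, lines))))
```

A line of exactly 72 characters passes, since the rule only forbids exceeding that length.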
VERIFICATION SUMMARY DATE: 05/09/2005 

PATENT APPLICATION: US/10/507,257 TIME: 07:58:23 

Input Set : A:\3170.1006-000 Sequence List.TXT 
Output Set: N:\CRF4\05092005\J507257.raw 

L:11 M:271 C: Current Filing Date differs, Replaced Current Filing Date 